The open-access dataset used in this analysis comes from the Airbnb platform and was downloaded via this website.
It contains data collected from that platform on residential homes in Paris offered for short-term rental by private persons.
From the analysis we want to learn the following:
Variables:
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
import plotly.express as px
import ipywidgets as widgets
from ipywidgets import interact, interact_manual
import geopandas as gpd
import math
import folium
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster
airbnb = pd.read_csv('listings.csv')
airbnb.head(5)
| | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 | number_of_reviews_ltm | license |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5396 | Explore the heart of old Paris | 7903 | Borzou | NaN | Hôtel-de-Ville | 48.85207 | 2.35871 | Entire home/apt | 100 | 2 | 263 | 2020-08-08 | 2.68 | 1 | 55 | 35 | 7510402838018 |
| 1 | 7397 | MARAIS - 2ROOMS APT - 2/4 PEOPLE | 2626 | Franck | NaN | Hôtel-de-Ville | 48.85909 | 2.35315 | Entire home/apt | 105 | 10 | 282 | 2021-09-29 | 2.28 | 1 | 233 | 13 | 7510400829623 |
| 2 | 7964 | Large & sunny flat with balcony ! | 22155 | Anaïs | NaN | Opéra | 48.87417 | 2.34245 | Entire home/apt | 130 | 6 | 6 | 2015-09-14 | 0.07 | 1 | 293 | 0 | 7510903576564 |
| 3 | 9359 | Cozy, Central Paris: WALK or VELIB EVERYWHERE ! | 28422 | Bernadette | NaN | Louvre | 48.85899 | 2.34735 | Entire home/apt | 75 | 180 | 0 | NaN | NaN | 1 | 58 | 0 | Available with a mobility lease only ("bail mo... |
| 4 | 9952 | Paris petit coin douillet | 33534 | Elisabeth | NaN | Popincourt | 48.86227 | 2.37134 | Entire home/apt | 80 | 4 | 32 | 2021-06-23 | 0.51 | 1 | 212 | 7 | 7511101582862 |
airbnb.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49634 entries, 0 to 49633
Data columns (total 18 columns):
 #   Column                          Non-Null Count  Dtype
---  ------                          --------------  -----
 0   id                              49634 non-null  int64
 1   name                            49587 non-null  object
 2   host_id                         49634 non-null  int64
 3   host_name                       49606 non-null  object
 4   neighbourhood_group             0 non-null      float64
 5   neighbourhood                   49634 non-null  object
 6   latitude                        49634 non-null  float64
 7   longitude                       49634 non-null  float64
 8   room_type                       49634 non-null  object
 9   price                           49634 non-null  int64
 10  minimum_nights                  49634 non-null  int64
 11  number_of_reviews               49634 non-null  int64
 12  last_review                     38625 non-null  object
 13  reviews_per_month               38625 non-null  float64
 14  calculated_host_listings_count  49634 non-null  int64
 15  availability_365                49634 non-null  int64
 16  number_of_reviews_ltm           49634 non-null  int64
 17  license                         27933 non-null  object
dtypes: float64(4), int64(8), object(6)
memory usage: 6.8+ MB
The dataset contains 18 columns and 49634 rows. It should be noted that for some of the variables there is missing data. We're going to get more information about the missing data.
airbnb.isnull().sum()
id                                    0
name                                 47
host_id                               0
host_name                            28
neighbourhood_group               49634
neighbourhood                         0
latitude                              0
longitude                             0
room_type                             0
price                                 0
minimum_nights                        0
number_of_reviews                     0
last_review                       11009
reviews_per_month                 11009
calculated_host_listings_count        0
availability_365                      0
number_of_reviews_ltm                 0
license                           21701
dtype: int64
Six columns contain missing data. The column 'neighbourhood_group' doesn't contain any values, so it is removed. The columns 'id', 'host_name', 'license' and 'number_of_reviews_ltm' are dropped as well.
airbnb = airbnb.drop(['id','host_name','neighbourhood_group','license','number_of_reviews_ltm'], axis=1)
airbnb.head()
| | name | host_id | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Explore the heart of old Paris | 7903 | Hôtel-de-Ville | 48.85207 | 2.35871 | Entire home/apt | 100 | 2 | 263 | 2020-08-08 | 2.68 | 1 | 55 |
| 1 | MARAIS - 2ROOMS APT - 2/4 PEOPLE | 2626 | Hôtel-de-Ville | 48.85909 | 2.35315 | Entire home/apt | 105 | 10 | 282 | 2021-09-29 | 2.28 | 1 | 233 |
| 2 | Large & sunny flat with balcony ! | 22155 | Opéra | 48.87417 | 2.34245 | Entire home/apt | 130 | 6 | 6 | 2015-09-14 | 0.07 | 1 | 293 |
| 3 | Cozy, Central Paris: WALK or VELIB EVERYWHERE ! | 28422 | Louvre | 48.85899 | 2.34735 | Entire home/apt | 75 | 180 | 0 | NaN | NaN | 1 | 58 |
| 4 | Paris petit coin douillet | 33534 | Popincourt | 48.86227 | 2.37134 | Entire home/apt | 80 | 4 | 32 | 2021-06-23 | 0.51 | 1 | 212 |
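The missing-value check and the removal of the empty column above can be wrapped in a small helper. The sketch below is only an illustration: the function name `missing_summary` and the toy frame are assumptions, not part of the original notebook. It reports the count and the percentage of missing values per column and lists the columns that are entirely empty.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)


# Toy frame standing in for the listings data (hypothetical values).
toy = pd.DataFrame({
    "price": [100, 105, 130, 75],
    "neighbourhood_group": [np.nan] * 4,
    "reviews_per_month": [2.68, 2.28, 0.07, np.nan],
})

summary = missing_summary(toy)
# Columns with 100% missing values carry no information and can be dropped.
empty_cols = summary.index[summary["missing_pct"] == 100].tolist()
print(summary)
print(empty_cols)
```

On the real data the same call would be `missing_summary(airbnb)`, run before any columns are dropped.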
We're replacing missing data in the column 'reviews_per_month' with the value 0:
airbnb.fillna({'reviews_per_month':0}, inplace = True)
airbnb.reviews_per_month.isnull().sum(0)
0
airbnb.describe()
| | host_id | latitude | longitude | price | minimum_nights | number_of_reviews | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|
| count | 4.963400e+04 | 49634.000000 | 49634.000000 | 49634.000000 | 49634.000000 | 49634.000000 | 49634.000000 | 49634.000000 | 49634.000000 |
| mean | 9.711188e+07 | 48.863998 | 2.344889 | 130.203207 | 111.961256 | 21.131080 | 0.633449 | 13.192167 | 100.681569 |
| std | 1.179651e+08 | 0.018146 | 0.033118 | 229.383172 | 169.898643 | 45.624607 | 1.148105 | 47.381927 | 134.541931 |
| min | 2.626000e+03 | 48.812220 | 2.221440 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 25% | 1.321955e+07 | 48.850880 | 2.324130 | 60.000000 | 2.000000 | 1.000000 | 0.020000 | 1.000000 | 0.000000 |
| 50% | 3.876027e+07 | 48.865330 | 2.347985 | 90.000000 | 4.000000 | 5.000000 | 0.220000 | 1.000000 | 1.000000 |
| 75% | 1.437275e+08 | 48.878450 | 2.369280 | 140.000000 | 365.000000 | 21.000000 | 0.770000 | 2.000000 | 221.000000 |
| max | 4.263328e+08 | 48.905690 | 2.467120 | 11600.000000 | 9999.000000 | 1711.000000 | 50.920000 | 382.000000 | 365.000000 |
First, note the high standard deviations, which indicate a significant dispersion around the mean. Therefore, the next step is to visualise the distributions of all numeric variables with box plots and density plots (excluding 'host_id', 'latitude' and 'longitude').
f, axes = plt.subplots(2, 3, figsize=(15, 10))
sns.boxplot(x=airbnb['price'], palette="flare", ax=axes[0, 0])
sns.boxplot(x=airbnb['minimum_nights'], palette="flare", ax=axes[0, 1])
sns.boxplot(x=airbnb['number_of_reviews'], palette="flare", ax=axes[0, 2])
sns.boxplot(x=airbnb['reviews_per_month'], palette="flare", ax=axes[1, 0])
sns.boxplot(x=airbnb['calculated_host_listings_count'], palette="flare", ax=axes[1, 1])
sns.boxplot(x=airbnb['availability_365'], palette="flare", ax=axes[1, 2])
f.suptitle("Box plots", fontsize = 30)
f.tight_layout()
plt.show()
From the box plots we can see that there are a lot of outliers.
f, axes = plt.subplots(2, 3, figsize=(15, 10))
sns.histplot(airbnb['price'], color='#D6AA8D', ax=axes[0, 0], kde=True, stat="density", linewidth=0)
sns.histplot(airbnb['minimum_nights'], color='#CC8D7A', ax=axes[0, 1], kde=True, stat="density", linewidth=0)
sns.histplot(airbnb['number_of_reviews'], color='#BB686E', ax=axes[0, 2], kde=True, stat="density", linewidth=0)
sns.histplot(airbnb['reviews_per_month'], color='#AB5A6D', ax=axes[1, 0], kde=True, stat="density", linewidth=0)
sns.histplot(airbnb['calculated_host_listings_count'], color='#974F6E', ax=axes[1, 1], kde=True, stat="density", linewidth=0)
sns.histplot(airbnb['availability_365'], color='#55335E', ax=axes[1, 2], kde=True, stat="density", linewidth=0)
f.suptitle("Density plots", fontsize = 30)
f.tight_layout()
plt.show()
The density plots show that none of the distributions resembles a normal distribution. All plots are quite similar, with long right tails.
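The visual impression of long right tails can also be expressed as a number. The sketch below is a hedged illustration on synthetic data (the column names and the random samples are assumptions): pandas' `skew()` returns a value close to 0 for a symmetric distribution and a clearly positive value for a right-tailed one.

```python
import numpy as np
import pandas as pd

# Synthetic data: one symmetric column and one right-tailed column (hypothetical).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "symmetric": rng.normal(loc=100, scale=10, size=5000),
    "right_tailed": rng.lognormal(mean=4.5, sigma=0.8, size=5000),
})

# Sample skewness per column: near 0 means symmetric, positive means a long right tail.
skewness = toy.skew()
print(skewness.round(2))
```

Applied to the notebook's data, `airbnb[['price', 'minimum_nights', 'number_of_reviews']].skew()` would give the corresponding figures for the real columns.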
Therefore, the next step is to remove the outliers.
To remove the outliers, the IQR (interquartile range) method was used.
The IQR describes the middle 50% of values when ordered from lowest to highest. To find it, first find the medians of the lower and upper halves of the data; these values are quartile 1 (Q1) and quartile 3 (Q3). The IQR is the difference between Q3 and Q1. Observations below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR in any of the selected columns are treated as outliers and removed.
nazwa_kol = ['price','minimum_nights','number_of_reviews','reviews_per_month','calculated_host_listings_count','availability_365']
Q1 = airbnb[nazwa_kol].quantile(0.25)
Q3 = airbnb[nazwa_kol].quantile(0.75)
IQR = Q3 - Q1
airbnb = airbnb[~((airbnb[nazwa_kol] < (Q1 - 1.5 * IQR)) |(airbnb[nazwa_kol] > (Q3 + 1.5 * IQR))).any(axis=1)]
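The filter above can be checked on a small example. The sketch below repackages the same logic as a function; the name `iqr_mask`, the parameter `k` and the toy values are assumptions made for illustration. One extreme price is flagged while the remaining rows are kept.

```python
import pandas as pd


def iqr_mask(df: pd.DataFrame, cols: list, k: float = 1.5) -> pd.Series:
    """Return True for rows whose values in `cols` all lie inside the IQR fences."""
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    # A row is an outlier if at least one selected column falls outside its fences.
    outside = (df[cols] < q1 - k * iqr) | (df[cols] > q3 + k * iqr)
    return ~outside.any(axis=1)


# Toy data (hypothetical): the last price is far above the upper fence.
toy = pd.DataFrame({
    "price": [60, 80, 90, 100, 140, 11600],
    "minimum_nights": [1, 2, 2, 3, 4, 5],
})

kept = toy[iqr_mask(toy, ["price", "minimum_nights"])]
print(kept)
```

Because the fences are computed per column and the row is dropped if any column is outside them, the number of removed rows grows with the number of columns included in the check.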
Additionally, we are going to remove the observations with the price that equals 0.
airbnb= airbnb.drop(airbnb[(airbnb['price']==0)].index)
Once again we are checking the descriptive statistics for the dataset, now after removing the outliers.
airbnb.describe()
| | host_id | latitude | longitude | price | minimum_nights | number_of_reviews | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|
| count | 3.310600e+04 | 33106.000000 | 33106.000000 | 33106.000000 | 33106.000000 | 33106.000000 | 33106.000000 | 33106.000000 | 33106.000000 |
| mean | 7.938232e+07 | 48.864573 | 2.347423 | 87.854226 | 151.407298 | 9.272277 | 0.278156 | 1.129614 | 59.863167 |
| std | 1.011875e+08 | 0.019227 | 0.034204 | 46.084590 | 176.519159 | 11.899767 | 0.356374 | 0.396893 | 111.194356 |
| min | 9.412000e+03 | 48.812220 | 2.221440 | 8.000000 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| 25% | 1.307353e+07 | 48.849940 | 2.325320 | 55.000000 | 2.000000 | 1.000000 | 0.010000 | 1.000000 | 0.000000 |
| 50% | 3.457965e+07 | 48.866300 | 2.349900 | 78.000000 | 7.000000 | 4.000000 | 0.140000 | 1.000000 | 0.000000 |
| 75% | 1.000057e+08 | 48.880970 | 2.373980 | 105.000000 | 365.000000 | 14.000000 | 0.410000 | 1.000000 | 69.000000 |
| max | 4.263279e+08 | 48.905690 | 2.467120 | 260.000000 | 730.000000 | 51.000000 | 1.890000 | 3.000000 | 365.000000 |
After removing the outliers, 33106 of the original 49634 listings remain, and the standard deviation decreased for all six filtered variables except 'minimum_nights'. The mean price dropped markedly, from about 130 to almost 88 euro. The average number of reviews per listing is about 9 (with a standard deviation of about 12), whereas the most reviewed offers have 51. The mean availability is almost 60 days, but there are offers with 0 availability as well as offers available throughout the entire year (365 days).
We're going to find out what neighbourhoods are present in the dataset and with what frequency they occur:
airbnb.neighbourhood.value_counts()
Buttes-Montmartre      4238
Popincourt             3479
Vaugirard              2613
Entrepôt               2504
Batignolles-Monceau    2369
Ménilmontant           2313
Buttes-Chaumont        2279
Passy                  1532
Reuilly                1525
Opéra                  1490
Observatoire           1364
Gobelins               1354
Temple                 1139
Panthéon                945
Bourse                  759
Hôtel-de-Ville          740
Luxembourg              711
Palais-Bourbon          706
Élysée                  595
Louvre                  451
Name: neighbourhood, dtype: int64
Most offers come from Buttes-Montmartre and Popincourt.
Now, we are going to see how the neighbourhoods are distributed on the map of Paris.
plt.figure(figsize=(14,10))
sns.scatterplot(x='longitude', y='latitude', hue='neighbourhood',s=20, data=airbnb, palette='cubehelix')
plt.title("Neighbourhoods' layout", fontsize = 30)
plt.show()
Next, we want to check what types of properties are in the dataset.
airbnb.room_type.value_counts()
Entire home/apt    28135
Private room        4645
Shared room          229
Hotel room            97
Name: room_type, dtype: int64
In the dataset, rentals of entire homes or apartments outnumber all other types. Private rooms come second, followed by shared rooms and, lastly, hotel rooms.
First, we are going to look at the correlation heat map for all the numeric variables.
plt.figure(figsize=(15,8))
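# Restrict to numeric columns: recent pandas versions raise an error when corr() meets text columns
sns.heatmap(airbnb.select_dtypes('number').corr(), annot=True, cmap='flare')
plt.title('Correlation plot of the dataset variables', fontsize = 20)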
plt.show()
The plot shows no strong correlations in the dataset. The only higher value is the one between 'reviews_per_month' and 'number_of_reviews', which is expected, since both measure review activity.
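The heat map above relies on the default Pearson coefficient, which only measures linear association. As a hedged illustration on synthetic data (the columns `x` and `y` are assumptions), the sketch below shows a relationship that is perfectly monotonic yet far from linear: the Pearson value is moderate, while the Spearman rank correlation is 1.

```python
import numpy as np
import pandas as pd

# Synthetic example: y grows with x at every step, but exponentially rather than linearly.
x = np.linspace(0, 10, 50)
toy = pd.DataFrame({"x": x, "y": np.exp(x)})

pearson = toy.corr(method="pearson").loc["x", "y"]
spearman = toy.corr(method="spearman").loc["x", "y"]
print(round(pearson, 3), round(spearman, 3))
```

Running `corr(method='spearman')` on the numeric columns of the listings data would therefore be one way to check whether a weak Pearson value hides a monotonic relationship.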
Although there is no linear correlation between price and the number of reviews, we are going to see how these two variables relate to each other when split by property type.
sns.set_style('white')
sns.color_palette("flare", as_cmap=True)
g = sns.FacetGrid(airbnb, col='room_type', sharey=False)
g.map_dataframe(sns.scatterplot, x='price', y='number_of_reviews', color="#B8656E")
g.set_axis_labels('Price', 'Number of reviews')
g.set_titles(col_template='{col_name}')
g.add_legend()
g.fig.subplots_adjust(top=0.7)
g.fig.suptitle('Number of reviews and price of the rental property', fontsize = 20);
For the first type of property (entire homes/apartments) it is difficult to find any relationship. For private rooms, offers from around 130 euro per night tend to have fewer reviews; not many exceed 30. For shared rooms, there are many observations with a very low or zero number of reviews, and most of them are offers below 100 euro. For hotel rooms, the prices are evidently the highest, and it can be said that the higher the price, the higher the number of reviews.
Similarly, price and availability show no linear correlation, but we are going to see how they relate to each other when split by property type.
g = sns.FacetGrid(airbnb, col='room_type', sharey=False)
g.map_dataframe(sns.scatterplot, x='price', y='availability_365', color="#914D6E")
g.set_axis_labels('Price', 'Availability')
g.set_titles(col_template='{col_name}')
g.add_legend()
g.fig.subplots_adjust(top=0.7)
g.fig.suptitle('Availability and price of the rental property', fontsize = 20);
In this case, too, the first type of property doesn't show a relationship between the two variables. For private rooms priced above around 150 euro, the availability is rather higher, but there are still offers with 0 availability. Shared rooms in the medium price range show 0 availability, whereas for the cheapest ones the availability varies. Hotel rooms, on the other hand, can be characterised as having high availability, although observations with 0 availability can be found as well.
title = "The average price of rentals depending on their type"
# Order the bars by mean price, from the cheapest to the most expensive room type
result = airbnb.groupby('room_type')['price'].mean().reset_index().sort_values('price')
plt.figure(figsize=(15,8))
# barplot draws the mean of 'price' per room type by default
sns.barplot(x='room_type', y='price', data=airbnb, order=result['room_type'], palette="flare")
plt.title(title, fontsize = 20)
plt.xlabel('Room type')
plt.ylabel('Price')
plt.show()
Hotel rooms have the highest prices per night, about 170 euro on average. Next come entire homes/apartments (about 92 euro) and private rooms (about 60 euro), and the cheapest are shared rooms, at about 46 euro per night on average.
The exact mean values, together with the average number of reviews and availability, are presented below:
sample1 = airbnb.groupby("room_type").agg({
"price": [ "mean"],
"number_of_reviews": ["mean"],
"availability_365": ["mean"]})
sample1
| room_type | price (mean) | number_of_reviews (mean) | availability_365 (mean) |
|---|---|---|---|
| Entire home/apt | 92.541070 | 9.576719 | 59.329838 |
| Hotel room | 170.185567 | 9.701031 | 260.134021 |
| Private room | 59.823681 | 7.528741 | 58.861787 |
| Shared room | 45.720524 | 7.052402 | 60.868996 |
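The two-level column header in the table above is a consequence of passing lists of functions to `agg`. As a sketch only, assuming flat column names are preferred, pandas' named aggregation produces a single header row; the output names `mean_price`, `mean_reviews` and `mean_availability` and the toy values are hypothetical.

```python
import pandas as pd

# Toy frame with the same column names as the listings data (values are made up).
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Private room"],
    "price": [100, 80, 60, 40],
    "number_of_reviews": [10, 20, 4, 6],
    "availability_365": [0, 100, 30, 50],
})

# Named aggregation: output_name=(input_column, function) gives flat column labels.
summary = (
    toy.groupby("room_type")
    .agg(
        mean_price=("price", "mean"),
        mean_reviews=("number_of_reviews", "mean"),
        mean_availability=("availability_365", "mean"),
    )
    .round(2)
    .sort_values("mean_price", ascending=False)
)
print(summary)
```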
title = "The average price of rental in different neighbourhoods"
# Order the bars by mean price, from the cheapest to the most expensive neighbourhood
result = airbnb.groupby('neighbourhood')['price'].mean().reset_index().sort_values('price')
plt.figure(figsize=(15,8))
# barplot draws the mean of 'price' per neighbourhood by default
sns.barplot(x='price', y='neighbourhood', data=airbnb, order=result['neighbourhood'], palette="flare")
plt.title(title, fontsize = 20)
plt.xlabel('Price')
plt.ylabel('Neighbourhood')
plt.show()
The most expensive neighbourhoods for rent are Louvre, Élysée and Luxembourg, where the average price per night is around 115 euro. The cheapest ones are Ménilmontant and Buttes-Chaumont, where average prices are slightly above 70 euro. Buttes-Montmartre, the neighbourhood with the most offers, is characterised by relatively low prices, on average slightly below 80 euro per night. One possible explanation is that the more listings there are in an area, the bigger the competition, so the prices have to be competitive.
The exact mean values, together with the average number of reviews and availability, are presented below:
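The suggested link between the number of listings and the price level can be checked rather than assumed. The sketch below is a minimal illustration on made-up data (the neighbourhood labels `A`, `B`, `C` and the prices are hypothetical): it computes the number of listings and the mean price per neighbourhood and then their Spearman rank correlation, where a negative value would be consistent with the competition argument.

```python
import pandas as pd

# Toy data: neighbourhood A has many cheap listings, C has a single expensive one.
toy = pd.DataFrame({
    "neighbourhood": ["A"] * 4 + ["B"] * 2 + ["C"],
    "price": [70, 75, 80, 75, 100, 110, 120],
})

# One row per neighbourhood: number of listings and mean price.
per_area = toy.groupby("neighbourhood").agg(
    listings=("price", "size"),
    mean_price=("price", "mean"),
)

# Rank correlation between supply and price level across neighbourhoods.
rho = per_area.corr(method="spearman").loc["listings", "mean_price"]
print(per_area)
print(rho)
```

With only 20 neighbourhoods in the real data, such a coefficient is a rough indication at best, and it says nothing about causation.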
sample2 = airbnb.groupby("neighbourhood").agg({
"price": [ "mean"],
"number_of_reviews": ["mean"],
"availability_365": ["mean"]})
sample2
| neighbourhood | price (mean) | number_of_reviews (mean) | availability_365 (mean) |
|---|---|---|---|
| Batignolles-Monceau | 86.119038 | 7.704939 | 62.269312 |
| Bourse | 105.090909 | 11.509881 | 73.667984 |
| Buttes-Chaumont | 72.037297 | 8.250987 | 51.039930 |
| Buttes-Montmartre | 78.303445 | 8.784804 | 50.777017 |
| Entrepôt | 87.799521 | 10.222444 | 50.011581 |
| Gobelins | 76.771049 | 8.709749 | 57.299852 |
| Hôtel-de-Ville | 112.185135 | 11.124324 | 75.013514 |
| Louvre | 116.321508 | 10.534368 | 75.667406 |
| Luxembourg | 114.998594 | 10.137834 | 84.696203 |
| Ménilmontant | 71.019023 | 8.821444 | 50.850843 |
| Observatoire | 83.186217 | 9.563783 | 67.132698 |
| Opéra | 95.775839 | 9.157047 | 56.944295 |
| Palais-Bourbon | 106.237960 | 9.288952 | 66.145892 |
| Panthéon | 99.291005 | 10.489947 | 64.449735 |
| Passy | 105.293734 | 8.332898 | 96.372715 |
| Popincourt | 83.368784 | 10.049439 | 50.423972 |
| Reuilly | 80.930492 | 8.950820 | 58.501639 |
| Temple | 106.285338 | 11.339772 | 63.992976 |
| Vaugirard | 89.407960 | 8.926139 | 57.691160 |
| Élysée | 115.857143 | 8.709244 | 95.231933 |
top_rev = airbnb.nlargest(100, 'number_of_reviews')
plt.figure(figsize=(15,8))
plt.scatter(x='price', y= 'neighbourhood', c='availability_365', data=top_rev, alpha=0.5, cmap='flare')
plt.colorbar(label = 'Availability')
plt.xlabel('Price')
plt.ylabel('Neighbourhood')
plt.title('Top 100 most popular offers', fontsize = 20)
plt.grid()
plt.show()
The most popular neighbourhoods, meaning the ones with the most listings among the 100 most reviewed offers, are Popincourt and Vaugirard. The least popular are Opéra and Palais-Bourbon.
The colours of the points represent the availability of the offers. Most of them are yellowish or orangish, which indicates rather low to moderate availability. This makes sense: these listings are frequently reviewed, so they can be expected to be in demand.
However, there are also popular rentals with high availability. This can be explained by the somewhat higher prices of these offers (with some exceptions).
Nevertheless, most offers fall in the range of 50-150 euro per night. It can be said that the lower the price, the larger the number of rentals with low availability.
The exact values from the figure above are presented below:
top_rev.groupby('neighbourhood').agg({'neighbourhood':["count"], 'price':['mean'], 'availability_365':['mean']})
| neighbourhood | count | price (mean) | availability_365 (mean) |
|---|---|---|---|
| Batignolles-Monceau | 2 | 58.500000 | 0.000000 |
| Bourse | 8 | 102.375000 | 109.500000 |
| Buttes-Chaumont | 8 | 81.125000 | 92.125000 |
| Buttes-Montmartre | 9 | 69.111111 | 70.333333 |
| Entrepôt | 8 | 88.750000 | 94.125000 |
| Gobelins | 2 | 99.500000 | 27.000000 |
| Hôtel-de-Ville | 3 | 81.333333 | 122.333333 |
| Louvre | 2 | 142.500000 | 80.500000 |
| Luxembourg | 3 | 78.333333 | 0.000000 |
| Ménilmontant | 5 | 68.600000 | 115.400000 |
| Observatoire | 6 | 134.833333 | 127.666667 |
| Opéra | 1 | 70.000000 | 0.000000 |
| Palais-Bourbon | 1 | 130.000000 | 1.000000 |
| Panthéon | 3 | 127.333333 | 47.666667 |
| Passy | 4 | 104.000000 | 229.750000 |
| Popincourt | 15 | 104.733333 | 98.533333 |
| Reuilly | 3 | 53.000000 | 0.000000 |
| Temple | 2 | 108.000000 | 155.000000 |
| Vaugirard | 12 | 122.000000 | 87.000000 |
| Élysée | 3 | 130.333333 | 258.000000 |
The exact location of the most popular offers is presented on the map below.
The red icons mark the most popular tourist attractions in Paris. After clicking on the icon the name of the place is shown.
# Create the map
m_2 = folium.Map(location=[48.86, 2.35], tiles='cartodbpositron', zoom_start=12.2)
tooltip = "Click me!"
# Add points to the map
mc = MarkerCluster()
for idx, row in top_rev.iterrows():
    # Skip listings without valid coordinates
    if not math.isnan(row['longitude']) and not math.isnan(row['latitude']):
        mc.add_child(Marker([row['latitude'], row['longitude']]))
m_2.add_child(mc)
folium.Marker(
location=[48.858093, 2.294694],
popup="The Eiffel Tower",
icon=folium.Icon(color="red"),
tooltip=tooltip
).add_to(m_2)
folium.Marker(
location=[48.860294, 2.338629],
popup="Musée du Louvre",
icon=folium.Icon(color="red"),
tooltip=tooltip
).add_to(m_2)
folium.Marker(
location=[48.873756, 2.294946],
popup="Arc de Triomphe",
icon=folium.Icon(color="red"),
tooltip=tooltip
).add_to(m_2)
folium.Marker(
location=[48.852737, 2.350699],
popup="Notre Dame",
icon=folium.Icon(color="red"),
tooltip=tooltip
).add_to(m_2)
# Display the map
m_2
Most of the top offers are located at some distance from the main tourist attractions in Paris. More rentals from the most popular group can be found in the northern part of Paris.
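The distance between the offers and the attractions can also be measured instead of being read off the map. The sketch below is an illustration under stated assumptions: the helper names `haversine_km` and `nearest_landmark_km` are hypothetical, the landmark coordinates are the ones used for the red markers, the two toy listings reuse coordinates from the first rows of the dataset, and the Earth is treated as a sphere with a radius of 6371 km.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Landmark coordinates as used for the red markers on the map.
landmarks = {
    "The Eiffel Tower": (48.858093, 2.294694),
    "Musée du Louvre": (48.860294, 2.338629),
    "Arc de Triomphe": (48.873756, 2.294946),
    "Notre Dame": (48.852737, 2.350699),
}


def nearest_landmark_km(df: pd.DataFrame) -> pd.Series:
    """Distance from each listing to its closest landmark, in kilometres."""
    distances = pd.DataFrame({
        name: haversine_km(df["latitude"], df["longitude"], lat, lon)
        for name, (lat, lon) in landmarks.items()
    })
    return distances.min(axis=1)


# Two listings with coordinates taken from the first rows of the dataset.
toy = pd.DataFrame({"latitude": [48.85207, 48.87417], "longitude": [2.35871, 2.34245]})
toy["nearest_landmark_km"] = nearest_landmark_km(toy)
print(toy)
```

Calling `nearest_landmark_km(top_rev).describe()` would then summarise how far the most reviewed offers are from the four attractions.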
most_booked = airbnb[airbnb['availability_365'] < 25]
# Create a base map
m_1 = folium.Map(location=[48.86, 2.35], tiles='cartodbpositron', zoom_start=12)
# Add a heatmap to the base map
HeatMap(data=most_booked[['latitude', 'longitude']], radius=10).add_to(m_1)
# Display the map
m_1
The highest density of the most booked offers (yellow and orange) is observed in the north of Paris, with the biggest number of rentals located in the outskirts. A likely reason is the more affordable prices of the offers located there.
The exact values are presented below:
sample3 = most_booked.groupby("neighbourhood").agg({
"neighbourhood": [ "count"],
"price": [ "mean"],
"number_of_reviews": ["mean"],
"availability_365": ["mean"]})
sample3
| neighbourhood | count | price (mean) | number_of_reviews (mean) | availability_365 (mean) |
|---|---|---|---|---|
| Batignolles-Monceau | 1687 | 78.090101 | 6.953764 | 0.720213 |
| Bourse | 509 | 96.003929 | 10.880157 | 0.726916 |
| Buttes-Chaumont | 1713 | 67.266783 | 7.464682 | 0.746060 |
| Buttes-Montmartre | 3132 | 72.870051 | 8.208174 | 0.792465 |
| Entrepôt | 1866 | 81.980707 | 9.333869 | 0.700965 |
| Gobelins | 949 | 70.678609 | 7.851423 | 0.533193 |
| Hôtel-de-Ville | 471 | 102.861996 | 9.626327 | 0.953291 |
| Louvre | 288 | 107.177083 | 10.045139 | 0.920139 |
| Luxembourg | 422 | 102.867299 | 8.030806 | 0.796209 |
| Ménilmontant | 1705 | 66.112610 | 8.106158 | 0.700293 |
| Observatoire | 938 | 77.446695 | 8.196162 | 0.491471 |
| Opéra | 1071 | 88.211018 | 8.107376 | 0.688142 |
| Palais-Bourbon | 469 | 96.272921 | 8.618337 | 0.524520 |
| Panthéon | 644 | 91.242236 | 9.656832 | 0.818323 |
| Passy | 867 | 94.455594 | 6.765859 | 0.754325 |
| Popincourt | 2604 | 76.682028 | 9.061060 | 0.753840 |
| Reuilly | 1076 | 75.379182 | 8.208178 | 0.624535 |
| Temple | 792 | 98.325758 | 9.992424 | 1.055556 |
| Vaugirard | 1876 | 83.797974 | 7.914712 | 0.570362 |
| Élysée | 335 | 104.143284 | 6.776119 | 0.549254 |
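The counts above depend on how many listings each neighbourhood has in total, so a share may be easier to compare. For example, Buttes-Montmartre has 3132 low-availability offers out of 4238 listings, roughly 74%. The sketch below shows one way to compute such shares; the toy frame and the neighbourhood labels are hypothetical, while the threshold of 25 days is the one used for `most_booked`.

```python
import pandas as pd

# Toy data (hypothetical): availability in days per listing.
toy = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "availability_365": [0, 10, 200, 0, 300, 5],
})

# Flag listings below the 25-day threshold, then average the flag per neighbourhood.
low_availability = toy["availability_365"] < 25
share = (
    low_availability.groupby(toy["neighbourhood"])
    .mean()
    .sort_values(ascending=False)
)
print(share)
```

The mean of a boolean column is the fraction of `True` values, which is why no explicit division by the group size is needed.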
The map presents all the offers from the dataset: the colour of a point represents the price (on the given scale) and its size represents the number of reviews (the bigger the dot, the more reviews the offer has). Moreover, hovering over a point displays more detailed information, such as the exact price, the number of reviews, the name of the neighbourhood, the property type and its availability.
def location_map(df, color, size, title):
    # Plot every listing in `df`; marker colour and size are driven by the given columns
    fig = px.scatter_mapbox(df, lat='latitude', lon='longitude',
                            size = size,
                            color = color,
                            size_max = 8,
                            zoom = 10,
                            height = 500,
                            title = title,
                            hover_data = {'latitude' : False,
                                          'longitude' : False,
                                          'neighbourhood' : True,
                                          'price': True,
                                          'room_type' : True,
                                          'number_of_reviews' : True,
                                          'availability_365' : True})
    fig.update_layout(mapbox_style="open-street-map")
    fig.show()

location_map(airbnb, 'price', 'number_of_reviews', 'Visualisation of the offers on the map of Paris')
As could be expected, the listings located further from the city centre tend to have lower prices. The closer to the city centre, the more offers with higher prices per night. The number of reviews hardly depends on location: rentals with a high number of reviews can be found in the outskirts as well as in the city centre.
The advantage of this kind of visualisation is that if we are interested in a certain location, it is easy to zoom in on the map and compare the offers in that area using the information shown when hovering over a particular point.
After conducting the analysis, all targets were met.
To conclude: